First, we split the dataset into three subsets (S1, S2 and S3) for further analysis.
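# S1: columns 2 to 99, keeping only rows without missing values
dt_s1_orig <- as.data.frame(na.omit(dt[, 2:99], invert = FALSE))
# S2: columns 100 to 456 for the first nrow(dt_s1_orig) rows
dt_s2_orig <- as.data.frame(dt[1:nrow(dt_s1_orig), 100:456])
# S3: the same columns for the remaining rows, starting one row after the end of S2
dt_s3_orig <- as.data.frame(dt[(nrow(dt_s2_orig) + 1):nrow(dt), 100:456])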
We computed the same summary statistics for subsets S1, S2 and S3 separately.
S1 is the most relevant subset at this point; its analysis produced the following output:
head(statistics_s1)
##    Station     Mean      Sd    Min  Sec_qrt   Median Fourth_qrt      Max
## 1:    ACME 16877462 7869606  12000 11404200 16946400   23734800 31347900
## 2:    ADAX 16237534 7905850 510000 10611000 16299300   23027400 31227000
## 3:    ALTU 17119189 7702989    900 11674500 17073600   23903700 31411500
## 4:    APAC 17010565 7883455   3300 11637000 17062500   23909400 31616100
## 5:    ARNE 17560173 7917965 477300 11666400 17578500   24503700 32645700
## 6:    BEAV 17612143 7911267    300 11493600 17520900   24683100 32884800
As mentioned above, we ran the same analysis for S2 and S3, which produced data tables with the same structure.
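The per-station summary can be sketched in base R as follows. This is a minimal illustration on synthetic data, not the code that produced `statistics_s1`; the names `toy_s1`, `summarise_station` and `statistics_toy` are hypothetical, and it assumes that the two quartile columns correspond to the 25% and 75% quantiles (called `Q1` and `Q3` here).

```r
# Toy stand-in for dt_s1_orig: one column per station, one row per day.
# The values are synthetic, not the project data.
set.seed(1)
toy_s1 <- data.frame(
  ACME = runif(100, 1e4, 3e7),
  ADAX = runif(100, 1e4, 3e7)
)

# Summarise one station's production vector.
# Assumption: the quartile columns are the 25% and 75% quantiles.
summarise_station <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  c(Mean   = mean(x, na.rm = TRUE),
    Sd     = sd(x, na.rm = TRUE),
    Min    = min(x, na.rm = TRUE),
    Q1     = q[1],
    Median = median(x, na.rm = TRUE),
    Q3     = q[2],
    Max    = max(x, na.rm = TRUE))
}

# sapply() returns one column per station; transpose to one row per station.
stats_mat <- t(sapply(toy_s1, summarise_station))
statistics_toy <- data.frame(Station = rownames(stats_mat), stats_mat,
                             row.names = NULL)
print(statistics_toy)
```

Applying the same function to the columns of the S2 and S3 data frames would give tables with the same columns.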
To identify outliers in the dataset, we ran a for-loop over the stations and checked each station's Min and Max against a 1.5 * IQR benchmark.
This check found no outliers in the S1 dataset, so we saw no need to adjust it.
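A check of this kind can be sketched as below. It assumes the common convention of fences at Q1 - 1.5 * IQR and Q3 + 1.5 * IQR; the data are synthetic and `count_outliers` is a hypothetical helper, so this may differ in detail from the loop we ran.

```r
# Synthetic data: station B contains one deliberately extreme value.
set.seed(42)
toy <- data.frame(
  A = rnorm(200, mean = 100, sd = 10),
  B = c(rnorm(199, mean = 100, sd = 10), 500)
)

# Count the values of x that fall outside the k * IQR fences.
count_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}

# Loop over the stations and store the number of outliers per station.
outliers <- integer(ncol(toy))
names(outliers) <- names(toy)
for (station in names(toy)) {
  outliers[station] <- count_outliers(toy[[station]])
}
print(outliers)
```

Because a station's Min and Max are its most extreme values, a station has no outliers under this rule exactly when both lie inside the fences.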
In this section, we first mapped the stations using Leaflet and then plotted the output of the statistical analysis above.
We clustered the stations by production mean:
head(blue)
## Station Mean
## 1: ALTU 17119189
## 2: ARNE 17560173
## 3: BEAV 17612143
## 4: BESS 17304074
## 5: BOIS 18688943
## 6: BUFF 17304512
nrow(blue)
## [1] 23
Selected Example for further analysis: GOOD
head(green)
## Station Mean
## 1: ACME 16877462
## 2: ADAX 16237534
## 3: BIXB 15969634
## 4: BLAC 16061707
## 5: BOWL 16034081
## 6: BREC 16655129
nrow(green)
## [1] 47
Selected Example for further analysis: ACME
head(red)
## Station Mean
## 1: CLAY 15486166
## 2: CLOU 15656934
## 3: COOK 15667280
## 4: COPA 15896897
## 5: EUFA 15718914
## 6: IDAB 15849510
nrow(red)
## [1] 22
Selected Example for further analysis: STIG
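One simple way to form such groups is to cut the station means at two thresholds. The sketch below uses a few of the means listed above; the cut points (15.9e6 and 17.0e6) are hypothetical values chosen only to be consistent with the rows shown, and are not necessarily the thresholds used in the analysis.

```r
# A few station means copied from the output above.
station_means <- c(ACME = 16877462, ADAX = 16237534, ALTU = 17119189,
                   ARNE = 17560173, CLAY = 15486166, COPA = 15896897)

# Hypothetical cut points, not taken from the analysis.
breaks <- c(-Inf, 15.9e6, 17.0e6, Inf)

# Assign each station to the interval that contains its mean.
group <- cut(station_means, breaks = breaks,
             labels = c("red", "green", "blue"))

# List the station names per group.
groups <- split(names(station_means), group)
print(groups)
```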
We first plotted all solar panels by their daily production mean and labelled the high and low performers with their station names.
Next, we visualized the standard deviation of production for each station, again labelling the high and low performers with their station names.
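We also scaled the data:
# Standardize x to zero mean and unit standard deviation
tipify <- function(x){
  mu <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  s[s == 0] <- 1  # avoid division by zero for constant input
  (x - mu) / s    # return the scaled values visibly
}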
We then plotted the resulting densities.
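As an illustration of this step, the sketch below standardizes two synthetic series and draws their kernel density estimates with base graphics. The function `standardize` mirrors the scaling function above under a different, hypothetical name so that the block is self-contained; the data and plot styling are assumptions, not the figure from the analysis.

```r
# Standardize x to zero mean and unit standard deviation.
standardize <- function(x) {
  mu <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  if (is.na(s) || s == 0) s <- 1  # guard against constant input
  (x - mu) / s
}

# Synthetic daily production for two stations (not the project data).
set.seed(7)
toy <- data.frame(
  ACME = rnorm(365, mean = 1.7e7, sd = 8e6),
  STIG = rnorm(365, mean = 1.5e7, sd = 7e6)
)

# Scale every column, then estimate one density per station.
scaled <- as.data.frame(lapply(toy, standardize))
dens <- lapply(scaled, density, na.rm = TRUE)

# Overlay the two density curves.
plot(dens$ACME, main = "Standardized production (toy data)", xlab = "z-score")
lines(dens$STIG, lty = 2)
legend("topright", legend = names(dens), lty = 1:2)
```

After scaling, the stations share a common axis, so differences in the shape of their distributions are easier to compare than on the raw production scale.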
We analyzed a) the correlations between the solar panels and the PCAs and b) the correlations among the explanatory variables themselves.
We identified PCA 1 as the most relevant explanatory variable for 97 of the 98 solar panels.
The exception was station #60, MTHE, for which PCA 2 showed the strongest correlation.
head(corr_main)
## Station Corr_PCA
## 1: ACME 1
## 2: ADAX 1
## 3: ALTU 1
## 4: APAC 1
## 5: ARNE 1
## 6: BEAV 1
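The lookup of the most strongly correlated component per station can be sketched as follows. The data are synthetic, the station names `ST_A` and `ST_B` and the table name `corr_main_toy` are hypothetical, and the use of the absolute Pearson correlation is an assumption about how "most relevant" was measured.

```r
# Synthetic components and two stations driven by different components.
set.seed(3)
n <- 200
pcs <- data.frame(PCA1 = rnorm(n), PCA2 = rnorm(n), PCA3 = rnorm(n))
stations <- data.frame(
  ST_A = 2 * pcs$PCA1 + rnorm(n, sd = 0.5),
  ST_B = -3 * pcs$PCA2 + rnorm(n, sd = 0.5)
)

# Correlation matrix with stations in rows and components in columns.
corr <- cor(stations, pcs)

# For each station, the index of the component with the largest |correlation|.
best <- apply(abs(corr), 1, which.max)

corr_main_toy <- data.frame(Station = rownames(corr), Corr_PCA = unname(best))
print(corr_main_toy)
```

Taking the absolute value means that a strong negative correlation, as for `ST_B` here, counts as much as a strong positive one.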
max(max_corr_per_PCA);
## [1] 0.07248046
The highest absolute correlation coefficient between any PCA and the other PCAs is 0.072. With this approach, there is not enough evidence of correlation to consider any of the PCAs redundant.
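A value like the one above can be obtained by taking, for each component, its largest absolute correlation with any other component and then the maximum over all components. The sketch below does this on synthetic columns; `max_corr_per_pc` is a hypothetical name and may not match how `max_corr_per_PCA` was actually built.

```r
# Synthetic stand-in for the explanatory variables: 5 independent columns.
set.seed(11)
pcs <- as.data.frame(matrix(rnorm(500 * 5), ncol = 5))
names(pcs) <- paste0("PCA", 1:5)

# Absolute pairwise correlations; blank the diagonal so that the
# correlation of a column with itself (always 1) is ignored.
cm <- abs(cor(pcs))
diag(cm) <- NA

# Largest absolute correlation of each component with any other component.
max_corr_per_pc <- apply(cm, 1, max, na.rm = TRUE)

# Overall maximum across all components.
print(max(max_corr_per_pc))
```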
To get a better understanding of the analysis above, we visualized the findings.
We used the following code for the dimensionality reduction:
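library(caret)       # provides filterVarImp()
library(data.table)  # provides data.table()
# Return the names of the n_vars columns of dat that are most important for y
select_important <- function(dat, n_vars, y){
  # Filter-based importance of each column of dat with respect to y
  varimp <- filterVarImp(x = dat, y = y, nonpara = TRUE)
  varimp <- data.table(variable = rownames(varimp), imp = varimp[, 1])
  # Sort by decreasing importance and keep the top n_vars variables
  varimp <- varimp[order(-imp)]
  selected <- varimp$variable[1:n_vars]
  return(selected)
}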
We ran this analysis for three solar panels (Blue: GOOD, Green: ACME, Red: STIG), one from each of the three groups identified in section [1.2]. As the calculation requires large amounts of CPU time and memory, we report the output here instead of re-running it.
[1] "PCA1" "PCA2" "PCA4" "PCA5" "PCA3" "PCA6" "PCA7" "PCA24" "PCA9" "PCA32"
[1] "PCA1" "PCA2" "PCA7" "PCA4" "PCA6" "PCA5" "PCA24" "PCA32" "PCA26" "PCA35"
[1] "PCA1" "PCA2" "PCA7" "PCA4" "PCA5" "PCA6" "PCA24" "PCA26" "PCA32" "PCA33"
Group E - Datadores